Fast Packed String Matching for Short Patterns
نویسندگان
چکیده
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time. In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns. From our experimental results it turns out that, despite their quadratic worst case time complexity, the new presented algorithms become the clear winners on the average for short patterns, when compared against the most effective algorithms known in literature.
منابع مشابه
Towards a Very Fast Multiple String Matching Algorithm for Short Patterns
Multiple exact string matching is one of the fundamental problems in computer science and finds applications in many other fields, among which computational biology and intrusion detection. It turns out that short patterns appear in many instances of such problems and, in most cases, sensibly affect the performances of the algorithms. Recent solutions in the field of string matching try to expl...
متن کاملFast Searching in Packed Strings
Given strings P and Q the (exact) string matching problem is to find all positions of substrings in Q matching P . The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time which is optimal if we can only read one character at the time. However, most strings are stored in a computer in a packed representation with several characters in ...
متن کاملOptimal Packed String Matching
In the packed string matching problem, each machine word accommodates α characters, thus an n-character text occupies n/α memory words. We extend the Crochemore-Perrin constantspace O(n)-time string matching algorithm to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individually. Our solution can be efficie...
متن کاملPacked Compact Tries: A Fast and Efficient Data Structure for Online String Processing
In this paper, we present a new data structure called the packed compact trie (packed c-trie) which stores a set S of k strings of total length n in n log σ+O(k log n) bits of space and supports fast pattern matching queries and updates, where σ is the size of an alphabet. Assume that α = log σ n letters are packed in a single machine word on the standard word RAM model, and let f(k, n) denote ...
متن کاملA Fast Algorithm for Approximate String Matching on Gene Sequences
Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch p...
متن کامل